Adaptive Linear Quadratic Control Using Policy Iteration

Authors

  • Steven J. Bradtke
  • B. Erik Ydstie
  • Andrew G. Barto
Abstract

In this paper we present stability and convergence results for Dynamic Programming-based reinforcement learning applied to Linear Quadratic Regulation (LQR). The specific algorithm we analyze is based on Q-learning and is proven to converge to the optimal controller provided that the underlying system is controllable and a particular signal vector is persistently excited. The performance of the algorithm is illustrated by applying it to a model of a flexible beam.
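The abstract describes policy iteration for LQR: repeatedly evaluate the cost of the current linear feedback gain, then improve the gain greedily. The paper's Q-learning variant estimates these quantities from data; the sketch below is the model-based counterpart (Hewer's algorithm), shown only as an illustration of the iteration structure. The system matrices are hypothetical examples, not from the paper.

```python
import numpy as np

# Hypothetical double-integrator-like system (illustrative only)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)          # state cost
R = np.array([[1.0]])  # input cost


def policy_iteration_lqr(A, B, Q, R, K0, iters=30):
    """Model-based policy iteration for discrete-time LQR.

    K0 must be a stabilizing gain for u = -K x. The data-driven
    Q-learning algorithm of the paper estimates the same evaluation
    and improvement steps without knowing A and B.
    """
    K = K0
    for _ in range(iters):
        # Policy evaluation: solve P = Q + K'RK + Acl' P Acl
        # for the cost matrix of the current policy, by fixed-point
        # iteration (converges since Acl = A - BK is stable).
        Acl = A - B @ K
        P = Q + K.T @ R @ K
        for _ in range(500):
            P = Q + K.T @ R @ K + Acl.T @ P @ Acl
        # Policy improvement: greedy gain from the Q-function blocks,
        # K_new = (R + B'PB)^{-1} B'PA
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P


K, P = policy_iteration_lqr(A, B, Q, R, K0=np.array([[1.0, 2.0]]))
print("converged gain K:\n", K)
```

Each improvement step keeps the closed loop stable and monotonically decreases the cost matrix, which is the structure behind the convergence guarantee sketched in the abstract.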


Related papers

Adaptive linear quadratic control using policy iteration - American Control Conference, 1994

In this paper we present stability and convergence results for Dynamic Programming-based reinforcement learning applied to Linear Quadratic Regulation (LQR). The specific algorithm we analyze is based on Q-learning and it is proven to converge to the optimal controller provided that the underlying system is controllable and a particular signal vector is persistently excited. This is the first c...


Optimization of Markov jump linear system with controlled jump probabilities of modes

The optimal control problem of the Markov jump linear quadratic model with controlled jump probabilities of modes is investigated. Two kinds of mode control policies, open-loop and closed-loop, are considered. By using policy iteration and the performance potential concept, a sufficient condition is given for the optimal closed-loop control policy being better than the optimal o...


Greedy Adaptive Critics for LQR Problems: Convergence Proofs

A number of success stories have been told where reinforcement learning has been applied to problems in continuous state spaces using neural nets or other sorts of function approximators in the adaptive critics. However, the theoretical understanding of why and when these algorithms work is inadequate. This is clearly exemplified by the lack of convergence results for a number of important situa...


Optimal adaptive leader-follower consensus of linear multi-agent systems: Known and unknown dynamics

In this paper, the optimal adaptive leader-follower consensus of linear continuous time multi-agent systems is considered. The error dynamics of each player depends on its neighbors’ information. Detailed analysis of online optimal leader-follower consensus under known and unknown dynamics is presented. The introduced reinforcement learning-based algorithms learn online the approximate solution...


Finite-horizon near optimal adaptive control of uncertain linear discrete-time systems

In this paper, the finite-horizon near optimal adaptive regulation of linear discrete-time systems with unknown system dynamics is presented in a forward-in-time manner by using adaptive dynamic programming and Q-learning. An adaptive estimator (AE) is introduced to relax the requirement of system dynamics, and it is tuned by using Q-learning. The time-varying solution to the Bellman equation i...




Publication date: 1994